A Novel Approach to Mining Maximal Frequent Itemsets Based on Genetic Algorithm

Authors

  • Mir Md. Jahangir Kabir
  • Shuxiang Xu
  • Byeong Ho Kang
  • Zongyuan Zhao
Abstract

We present a new approach, based on a genetic algorithm, for generating maximal frequent itemsets from large databases. The new algorithm, called GeneticMax, is a heuristic that mimics natural selection to find maximal frequent itemsets efficiently. Its search strategy uses a lexicographic tree, which avoids level-by-level searching and thereby reduces the time required to mine maximal frequent itemsets in a linear way. Our implementation of the search strategy represents the nodes of the lexicographic tree as bitmaps and identifies frequent itemsets from the superset-subset relationships among the nodes. Because the new algorithm follows the principles of genetic algorithms, it performs a global search, and its time complexity is lower than that of other algorithms, since the genetic algorithm is based on a greedy approach. We isolate the effect of each step of the algorithm through experimental analysis on real databases, including Tic Tac Toe, Zoo, and a 10000×8 database, among others. Our experimental results show that the approach is efficient and scalable for different sizes of itemsets. Even when the search space is very large, it accesses the main database to calculate support values for fewer nodes when finding frequent itemsets, which dramatically reduces the search time.
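The bitmap representation and the superset-subset test mentioned in the abstract can be illustrated with a minimal sketch. This is not GeneticMax: the genetic search is replaced here by plain exhaustive enumeration in lexicographic order, and the function names (`build_bitmaps`, `support`, `maximal_frequent`) and the toy transactions are assumptions made for illustration only.

```python
from itertools import combinations


def build_bitmaps(transactions):
    """Map each item to an int whose bit t is set when transaction t contains it."""
    bitmaps = {}
    for t, items in enumerate(transactions):
        for item in items:
            bitmaps[item] = bitmaps.get(item, 0) | (1 << t)
    return bitmaps


def support(itemset, bitmaps, n_transactions):
    """Count the transactions containing every item by AND-ing the item bitmaps."""
    mask = (1 << n_transactions) - 1  # start with all transactions selected
    for item in itemset:
        mask &= bitmaps.get(item, 0)
    return bin(mask).count("1")


def maximal_frequent(transactions, min_support):
    """Return frequent itemsets that have no frequent proper superset.

    Exhaustive enumeration, exponential in the number of items; it stands in
    for the heuristic search described in the abstract (illustrative only).
    """
    bitmaps = build_bitmaps(transactions)
    n = len(transactions)
    items = sorted(bitmaps)
    frequent = []
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):  # lexicographic order
            if support(combo, bitmaps, n) >= min_support:
                frequent.append(frozenset(combo))
    # An itemset is maximal if no other frequent itemset strictly contains it.
    return [s for s in frequent if not any(s < other for other in frequent)]


if __name__ == "__main__":
    toy = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
    print(maximal_frequent(toy, min_support=3))
```

With a minimum support of 3 on the toy data, every pair is frequent but the full triple is not, so the three pairs are the maximal frequent itemsets. Storing each item as one integer bitmap means a support count costs one AND per item, which is the kind of saving a bitmap-based node representation aims for.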

Similar resources

Maximal frequent itemset generation using segmentation approach

Finding frequent itemsets in a data source is a fundamental operation behind Association Rule Mining. Generally, many algorithms use either the bottom-up or top-down approaches for finding these frequent itemsets. When the length of frequent itemsets to be found is large, the traditional algorithms find all the frequent itemsets from 1-length to n-length, which is a difficult process. This prob...

Data sanitization in association rule mining based on impact factor

Data sanitization is a process used to promote the sharing of transactional databases among organizations and businesses; it alleviates concerns for individuals and organizations regarding the disclosure of sensitive patterns. It transforms the source database into a released database so that counterparts cannot discover the sensitive patterns and so data confidentiality is preserved ag...

A Matrix based Maximal Frequent Itemset Mining Algorithm without Subset Creation

Frequent pattern mining is the main step in association rule mining. Several algorithms have been proposed for this, but the majority of them have two main problems: a large number of database scans and the generation of large candidate itemsets. This process is time-intensive because these algorithms first mine the minimal frequent itemsets and then generate maximal frequent itemsets from m...

Maximal Frequent Itemsets Mining Using Database Encoding

Frequent itemset mining is a classic problem in data mining and has played an important role in data mining research for over a decade. However, mining all frequent itemsets leads to a massive number of itemsets. Fortunately, this problem can be reduced to the mining of maximal frequent itemsets. In this paper, we propose a new method for mining maximal frequent itemsets. Our method ...

An Improved Mining Algorithm of Maximal Frequent Itemsets

Mining maximal frequent itemsets is very important in many data mining applications. How to improve the efficiency and effectiveness of mining algorithms has become an issue of wide interest. In this paper, we introduce a new method to solve this problem, which is based on graph theory. Firstly, the concept of directed itemsets graph and the trifurcate linked list storage structure are p...

Publication date: 2014